NSF PAR Search | NSF Public Access Repository

DiffServe: Efficiently Serving Text-to-Image Diffusion Models with Query-Aware Model Scaling

Ahmad, Sohaib; Yang, Qizheng; Wang, Haoliang; Sitaraman, Ramesh; Guan, Hui (May 2025, Proceedings of the 8th MLSys Conference)

Free, publicly-accessible full text available May 18, 2026
DiffServe: Efficiently Serving Text-to-Image Diffusion Models with Query-Aware Model Scaling

Ahmad, Sohaib; Yang, Qizheng; Wang, Haoliang; Sitaraman, Ramesh; Guan, Hui (May 2025, Proceedings of the 8th MLSys Conference)

Free, publicly-accessible full text available May 17, 2026
DiffServe: Efficiently Serving Text-to-Image Diffusion Models with Query-Aware Model Scaling

Ahmad, Sohaib; Yang, Qizheng; Wang, Haoliang; Sitaraman, Ramesh; Guan, Hui (May 2025, Proceedings of the 8th MLSys Conference, Santa Clara, CA, USA, 2025)

Free, publicly-accessible full text available May 12, 2026
AdapMTL: Adaptive Pruning Framework for Multitask Learning Model

Xiang, Mingcan; Tang, Jiaxun; Yang, Qizheng; Guan, Hui; Liu, Tongping (October 2024, 2024 ACM Multimedia)

Full Text Available
GMorph: Accelerating Multi-DNN Inference via Model Fusion

Yang, Qizheng; Yang, Tianyi; Xiang, Mingcan; Zhang, Lijun; Wang, Haoliang; Serafini, Marco; Guan, Hui (April 2024, ACM)

AI-powered applications often involve multiple deep neural network (DNN)-based prediction tasks to support application-level functionalities. However, executing multi-DNNs can be challenging due to the high resource demands and computation costs that increase linearly with the number of DNNs. Multi-task learning (MTL) addresses this problem by designing a multi-task model that shares parameters across tasks based on a single backbone DNN. This paper explores an alternative approach called model fusion: rather than training a single multi-task model from scratch as MTL does, model fusion fuses multiple task-specific DNNs, which are pre-trained separately and can have heterogeneous architectures, into a single multi-task model. We materialize model fusion in a software framework called GMorph to accelerate multi-DNN inference while maintaining task accuracy. GMorph features three main technical contributions: graph mutations to fuse multi-DNNs into resource-efficient multi-task models, search-space sampling algorithms, and predictive filtering to reduce the high search costs. Our experiments show that GMorph can outperform MTL baselines and reduce the inference latency of multi-DNNs by 1.1-3X while meeting the target task accuracy.
Full Text Available
GMorph: Accelerating Multi-DNN Inference via Model Fusion

Yang, Qizheng; Yang, Tianyi; Xiang, Mingcan; Zhang, Lijun; Wang, Haoliang; Serafini, Marco; Guan, Hui (April 2024, ACM EuroSys'24)

Full Text Available
